Statistical Match Procedure Used in the 2007-2009 Baselines

The statistical match procedure used in the 2007-2009 baselines is an unconstrained nearest-neighbor match similar to the one used in 2005-2006, with additional improvements to the matching methodology that change the block groups and the minimum distance function. Prior to matching, the CPS and PUF records are divided into mutually exclusive groups, and records can be matched only within the same group. The groups are defined by the following "blocking variables":
Several additional constraints on the matching algorithm reduce the number of PUF records that are potential matches for a particular TRIM3 record. These constraints relate to:
The 2007 baseline introduced a new "minimum distance" function for matching tax units, which depends on whether the unit has a top-coded income amount. It also continued the practice of using the PUF to restore variation to top-coded CPS incomes. During this era of the CPS, the Census Bureau top-coded income amounts exceeding certain thresholds in order to preserve confidentiality, and replaced top-coded amounts with averages calculated for all top-coded individuals. The replacement values used for earned income variables varied by gender, race/ethnicity, and whether the person worked full-time for the full year. One goal of the statistical match was to increase variation in income amounts over the threshold, allowing for more precise calculation of taxes.

Once the match procedure has identified the set of PUF records that can be matched to a given CPS tax unit, a PUF record is selected using a "minimum distance" function. This procedure differs between "high income" units (those with one or more income amounts above the top-coding threshold; see below) and lower income units.

If the tax unit is not treated as a "high income" unit for the purpose of the match, the distance function is based on AGI. Capital gains and IRA and Keogh contributions are obtained from the PUF record being considered for the match. The capital gains are added to, and the IRA and Keogh contributions are subtracted from, the preliminary AGI calculated by TRIM3. The resulting AGI is compared to the AGI of the available PUF records, and the record with the smallest difference in AGI is selected.

The statistical match restores variation to the following top-coded CPS income variables: wages, business income, farm income, interest, pensions, a combined variable representing dividends, estates, and trusts, and a combined variable representing rents and royalties.
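
The AGI-based selection for lower income units described above can be illustrated with a minimal Python sketch. This is not TRIM3 code: the `PufRecord` class, its field names, and the helper functions are hypothetical. The use of an absolute difference as the distance is an assumption, since the text says only that the record with the least difference in AGI is selected.

```python
from dataclasses import dataclass


@dataclass
class PufRecord:
    """Hypothetical stand-in for a PUF record; field names are illustrative."""
    agi: float
    capital_gains: float
    ira_keogh: float


def adjusted_agi(preliminary_agi: float, puf: PufRecord) -> float:
    # Add the candidate's capital gains to, and subtract its IRA/Keogh
    # contributions from, the preliminary AGI of the CPS tax unit.
    return preliminary_agi + puf.capital_gains - puf.ira_keogh


def select_by_agi(preliminary_agi: float, candidates: list) -> PufRecord:
    # Assumption: "least difference" is measured as an absolute difference.
    return min(
        candidates,
        key=lambda puf: abs(adjusted_agi(preliminary_agi, puf) - puf.agi),
    )


# Made-up candidates that are assumed to have passed blocking and constraints.
candidates = [
    PufRecord(agi=50_000, capital_gains=0, ira_keogh=0),
    PufRecord(agi=58_500, capital_gains=12_000, ira_keogh=2_000),
]
best = select_by_agi(48_000, candidates)
```

Because the adjustment uses values from the candidate PUF record itself, the AGI being compared is recomputed for every candidate. In this made-up example the second record is selected even though its AGI is further from the unadjusted preliminary AGI.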
If a tax unit has a top-coded value for one or more of the variables listed above (meaning it is a "high income" unit), the minimum distance function is computed by examining the difference between the CPS tax unit and the PUF record for each of ten income items reported on both the CPS and the PUF (wages, business income, farm income, interest, pensions, dividends/estates/trusts, rents/royalties, total social security benefits, unemployment compensation, and alimony received). However, for each top-coded income source, the matching algorithm:
Once a PUF record has been selected, variables from that record are assigned to the CPS tax unit. The weight of the PUF record is then reduced by the weight of the CPS tax unit. Once the weight of a PUF record has been reduced to zero, it cannot be matched to additional CPS tax units.

When a PUF record is matched to a top-coded tax unit, we replace the CPS income amount (for any top-coded variables) with the amount obtained from the PUF record, restoring variation to the top-coded income variable. The modified income variables do not overwrite the CPS income variables stored in the TRIM3 database, but are stored as a set of alternative income variables for use as input to the Federal Tax simulation. In particular, the baselines labeled "highinc" use the PUF income values, while the regular baselines use the CPS income values.

Because the variables obtained through the statistical match for an individual tax unit come from a single PUF record, we are limited in our ability to align any specific variable to a target. However, we do make some adjustments. We adjust the capital gains and deduction dollar amounts to reflect the change in average dollar amounts between the year of the PUF data and the tax year being simulated, and we make minor adjustments to increase or decrease the likelihood of selecting a PUF record based on whether the record has income or deduction values from particular sources (such as capital gains). We also perform some minimal alignment by adjusting the dollar amounts used to disallow matches to PUF records with very large income or deduction amounts.

The 2007-2009 baselines used the 2006 PUF.
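
The weight bookkeeping and the top-code replacement described above can be sketched in Python as follows. This is an illustration only, not the TRIM3 implementation: the dictionary layout, the function names, and the `pick` callback (which stands in for the minimum distance function) are all hypothetical. Flooring the PUF weight at zero when the CPS weight exceeds the remaining PUF weight is an assumption; the text does not say how that case is handled.

```python
from typing import Callable, Optional


def match_and_deplete(cps_weight: float, candidates: list,
                      pick: Callable[[list], dict]) -> Optional[dict]:
    # Only PUF records with weight remaining can still be matched.
    available = [puf for puf in candidates if puf["weight"] > 0]
    if not available:
        return None
    chosen = pick(available)
    # Reduce the PUF weight by the CPS weight (assumption: floor at zero).
    chosen["weight"] = max(0.0, chosen["weight"] - cps_weight)
    return chosen


def highinc_incomes(cps_incomes: dict, topcoded: set, puf: dict) -> dict:
    # Build an alternative set of income variables; the CPS values are copied,
    # not overwritten, and only top-coded sources take the PUF amount.
    alternative = dict(cps_incomes)
    for name in topcoded:
        alternative[name] = puf[name]
    return alternative


# Made-up data: one PUF record matched repeatedly until its weight runs out.
puf_records = [{"id": "A", "weight": 150.0, "wages": 410_000.0}]
take_first = lambda records: records[0]  # placeholder for the distance function

first = match_and_deplete(100.0, puf_records, take_first)
weight_after_first = puf_records[0]["weight"]
second = match_and_deplete(100.0, puf_records, take_first)
weight_after_second = puf_records[0]["weight"]
third = match_and_deplete(100.0, puf_records, take_first)

cps_incomes = {"wages": 250_000.0, "interest": 1_200.0}
alternative = highinc_incomes(cps_incomes, {"wages"}, puf_records[0])
```

Keeping `alternative` separate from `cps_incomes` mirrors the distinction between the "highinc" baselines, which use the PUF income values, and the regular baselines, which use the CPS values.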